Automatic Evaluation and Uniform Filter Cascades for Inducing N-Best Translation Lexicons

Author

  • I. Dan Melamed
Abstract

This paper shows how to induce an N-best translation lexicon from a bilingual text corpus using statistical properties of the corpus together with four external knowledge sources. The knowledge sources are cast as filters, so that any subset of them can be cascaded in a uniform framework. A new objective evaluation measure is used to compare the quality of lexicons induced with different filter cascades. The best filter cascades improve lexicon quality by up to 137% over the plain vanilla statistical method, and approach human performance. Drastically reducing the size of the training corpus has a much smaller impact on lexicon quality when these knowledge sources are used. This makes it practical to train on small hand-built corpora for language pairs where large bilingual corpora are unavailable. Moreover, three of the four filters prove useful even when used with large training corpora.
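The abstract describes the approach only at a high level. As a rough illustration of what a uniform filter cascade over scored candidate pairs can look like, here is a minimal Python sketch under several assumptions. Candidate pairs are scored with the Dice coefficient over aligned sentence pairs; this is a swapped-in statistic, since the abstract does not say which statistic the paper uses. Every filter is a predicate with the same signature, so any subset can be chained. The `same_word_class` filter and `FUNCTION_WORDS` list are hypothetical toy stand-ins, not the paper's four knowledge sources, and all function names are illustrative.

```python
from collections import Counter
from itertools import product
from typing import Callable, Iterable

# Every filter shares this signature, so any subset can be chained uniformly.
Filter = Callable[[str, str], bool]
Pair = tuple[str, str]


def cooccurrence_scores(bitext: Iterable[tuple[list[str], list[str]]]) -> dict[Pair, float]:
    """Score each (source, target) word pair with the Dice coefficient.

    Dice is a stand-in statistic; counts are taken over aligned sentence pairs.
    """
    src_freq: Counter = Counter()
    tgt_freq: Counter = Counter()
    pair_freq: Counter = Counter()
    for src_sent, tgt_sent in bitext:
        src_types, tgt_types = set(src_sent), set(tgt_sent)
        src_freq.update(src_types)
        tgt_freq.update(tgt_types)
        pair_freq.update(product(src_types, tgt_types))
    return {
        (s, t): 2 * n / (src_freq[s] + tgt_freq[t])
        for (s, t), n in pair_freq.items()
    }


def cascade(scores: dict[Pair, float], filters: list[Filter]) -> dict[Pair, float]:
    """Apply each filter in turn, keeping only candidate pairs it accepts."""
    kept = dict(scores)
    for keep in filters:
        kept = {pair: sc for pair, sc in kept.items() if keep(*pair)}
    return kept


def n_best(scores: dict[Pair, float], n: int) -> dict[str, list[str]]:
    """Return up to n highest-scoring targets per source word (ties broken alphabetically)."""
    by_source: dict[str, list[tuple[float, str]]] = {}
    for (s, t), sc in scores.items():
        by_source.setdefault(s, []).append((sc, t))
    return {
        s: [t for _, t in sorted(cands, key=lambda c: (-c[0], c[1]))[:n]]
        for s, cands in by_source.items()
    }


# Hypothetical toy filter: keep a pair if both words are function words, or neither is.
FUNCTION_WORDS = {"the", "a", "la", "une"}


def same_word_class(src: str, tgt: str) -> bool:
    return (src in FUNCTION_WORDS) == (tgt in FUNCTION_WORDS)


bitext = [
    (["the", "house"], ["la", "maison"]),
    (["the", "car"], ["la", "voiture"]),
    (["a", "house"], ["une", "maison"]),
]
scores = cooccurrence_scores(bitext)
print(n_best(scores, 2)["house"])                              # ['maison', 'une']
print(n_best(cascade(scores, [same_word_class]), 2)["house"])  # ['maison']
```

In this sketch each filter only discards candidates and never rescores them, so the filters commute: any subset, applied in any order, leaves the same surviving pairs. That is one simple way to obtain the kind of uniformity the abstract describes, though the paper's actual framework may differ.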

Similar articles

The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...

Semi-Automatic Acquisition of Domain-Specific Translation Lexicons

We investigate the utility of an algorithm for translation lexicon acquisition (SABLE), used previously on a very large corpus to acquire general translation lexicons, when that algorithm is applied to a much smaller corpus to produce candidates for domain-specific translation lexicons. 1 Introduction: Reliable translation lexicons are useful in many applications, such as cross-langua...

Evaluation of Context-Dependent Phrasal Translation Lexicons for Statistical Machine Translation

We present new direct data analysis showing that dynamically-built context-dependent phrasal translation lexicons are more useful resources for phrase-based statistical machine translation (SMT) than conventional static phrasal translation lexicons, which ignore all contextual information. After several years of surprising negative results, recent work suggests that context-dependent phrasal tr...

Automatic construction of translation lexicons

The paper describes a statistical approach to automatic extraction of translation lexicons from parallel corpora. We briefly describe the pre-processing steps, a baseline iterative method, and the actual algorithm. The evaluation for the two algorithms is presented in some detail in terms of precision, recall, coverage and processing time. The comparison with other works shows that our method i...

A Cheap and Fast Way to Build Useful Translation Lexicons

The paper presents a statistical approach to automatic building of translation lexicons from parallel corpora. We briefly describe the pre-processing steps, a baseline iterative method, and the actual algorithm. The evaluation for the two algorithms is presented in some detail in terms of precision, recall and processing time. We conclude by briefly presenting some of our applications of the mu...


Journal:
  • CoRR

Volume: cmp-lg/9505044

Publication date: 1995